ABSTRACT

Social media is used to analyze political campaigns, 
the stock market, movies, medicines, agriculture, etc. 
Twitter is a microblogging website where users read and 
write millions of tweets on a variety of topics daily. 
This project attempts to analyze the sentiment of the 
people towards election candidates based on their live 
opinions and emotions. The focus of our project is to 
assign a polarity to each tweet, that is, whether the 
user expresses a positive or negative opinion. From 
the extracted tweets, we try to find how frequently 
these emotions change. We also classify and 
differentiate the sentiments of the people before and 
after the election based on the tweets they post. The 
location of the Twitter user is used to identify the 
geographical area, which in turn helps to analyze the 
emotions of people in different areas. Our project 
uses the Naive Bayes approach in the R language and 
RStudio for processing the textual data. 

Keywords: Twitter, R, Sentiment Analysis, Naive 
Bayes Algorithm 

INTRODUCTION: 

Social media sites that are used to make frequent 
short posts, either publicly or within a group of users, 
are called microblogging sites. There are hundreds of 
such sites, so they hold a huge collection of data, and 
the volume of data grows rapidly. These data express 
the moods and sentiments of people on a large scale. 
The benefits of social media analytics include 
anticipation of business opportunities and competitive 
advantage. Cost efficiency is a major benefit of data 
analytics. Therefore, implementing the technology can 
benefit organizations in a number of industries. 
Twitter is one of the largest social media platforms 
and a microblogging site where numerous users upload 
data. Twitter allows its users to read and post 
messages of up to 140 characters, which are called 
tweets. Nearly 656 million tweets are posted in a day. 
Users express their ideas as tweets, which in turn 
express their mood. Twitter makes these utterances 
available as a data stream, which can be mined using 
mining algorithms. In this paper, we discuss the 
analysis of the sentiment of Twitter users for an 
election based on the hashtags and emojis they use in 
their tweets. 

Sentiment Analysis: 

Sentiment Analysis is the process of determining the 
subjective feelings of people, that is, whether their 
opinion about something is positive, negative or 
neutral. Sentiment Analysis, also known as Opinion 
Mining, is used to derive the attitude of the speaker. 
Sentiment analysis of social media has a variety of 
applications such as marketing, reviewing and customer 
service. For example, movie reviews can be analysed 
and reports generated to judge how well a movie was 
received by the public. 

Working Model: 

The system uses the Naive Bayes approach for text 
categorization. Naive Bayes classifiers assume that 
the effect of a variable value on a given class is 
independent of the values of other variables. This 
assumption is called conditional independence. In this 
paper, we propose an approach involving both 
dictionary-based and corpus-based techniques, which 
finds the semantic orientation of the sentiments in 
the tweets. Emoticons, neutralization, negation 
handling and capitalization are also considered, as 
they are a large part of modern internet language. To 
uncover the sentiments, we first extract the opinion 
words from the tweets and then find their orientation, 
that is, we decide whether the sentiment word reflects 
the feelings of the user. 

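$$P(c \mid x) = \frac{P(x \mid c) \, P(c)}{P(x)}$$

where P(c | x) is the posterior probability, P(x | c) 
is the likelihood, P(c) is the class prior probability 
and P(x) is the predictor prior probability. Under 
conditional independence, and dropping P(x) because it 
is the same for every class: 

$$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c)$$

Fig 1. Naive Bayes Approach 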


[Figure: block diagram of the working model; legible label: Keyword Search] 

Fig 2. Working model 


The following steps outline the process of the 
proposed system discussed in this paper: 

1. Retrieval of tweets 

2. Pre-processing of extracted data 

3. Parallel processing 

4. Sentiment scoring module 

5. Output sentiment 

1. Retrieval of tweets: 

Twitter is one of the most heavily used social 
networking sites and hosts posts on a wide range of 
topics worldwide. Rather than collecting all posts, we 
search for a particular topic and extract the tweets 
related to that topic. 

2. Pre-processing of extracted data: 

After retrieval, applying a sentiment analysis tool 
directly to the raw tweets results in very poor 
performance in most cases. Therefore, pre-processing 
techniques are necessary for obtaining better results. 


i) Filtering: 

Filtering is the cleaning of raw data. In this step, 
URL links (e.g. http://twitter.com), special words in 
Twitter (e.g. “RT”, which marks a retweet) and 
usernames (e.g. @kgk, where the @ symbol indicates a 
username) are removed, and emoticons are replaced with 
special strings. 
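
The filtering step can be sketched as follows. This is an illustrative example in Python rather than the R code used in the project, and the emoticon table and the placeholder strings EMO_POS and EMO_NEG are hypothetical, since the paper does not list the actual replacement strings.

```python
import re

# Hypothetical emoticon placeholders; the paper does not specify the real strings.
EMOTICONS = {":)": " EMO_POS ", ":-)": " EMO_POS ", ":(": " EMO_NEG ", ":-(": " EMO_NEG "}


def filter_tweet(text):
    """Remove URLs, the RT marker and @usernames; replace emoticons with placeholders."""
    text = re.sub(r"https?://\S+", " ", text)  # URL links (removed first so ':/' is not misread)
    text = re.sub(r"\bRT\b", " ", text)        # retweet marker
    text = re.sub(r"@\w+:?", " ", text)        # usernames, with an optional trailing colon
    for emoticon, placeholder in EMOTICONS.items():
        text = text.replace(emoticon, placeholder)
    return " ".join(text.split())              # collapse repeated whitespace
```

For example, `filter_tweet("RT @kgk: great rally today :) http://twitter.com")` leaves only the words of the message followed by the positive-emoticon placeholder.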

ii) Tokenization: 

Tokenization is the segmentation of sentences. In 
this step, we tokenize the text by splitting it on 
spaces and punctuation marks to form a bag of words. 
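
A minimal tokenizer along these lines is sketched below in Python (not the project's R code). The regular expression is an assumption: it treats runs of letters, digits and underscores as words and keeps apostrophes inside a word, which the paper does not specify.

```python
import re


def tokenize(text):
    """Split text on spaces and punctuation, returning the list of word tokens."""
    # \w+ matches a run of word characters; the optional group keeps forms like "don't" whole.
    return re.findall(r"\w+(?:'\w+)*", text)
```

Case is left untouched here because the paper treats capitalization as a signal in its own right.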

iii) Construction of n-grams: 

A set of n-grams is built from consecutive words. A 
negation word such as “no” or “not” is attached to the 
word that follows or precedes it. For instance, “I do 
not like jazz music” yields the bigrams “I do+not”, 
“do+not like” and “not+like jazz”. Taking negation 
into account improves the accuracy of the 
classification, because negation is a very common 
linguistic construction that affects polarity. 
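
One way to produce such negation-aware bigrams is sketched below in Python (not the project's R code). It assumes that the negation is attached once to the preceding word and once to the following word, and that bigrams are then taken from both token sequences; the helper names are hypothetical.

```python
NEGATIONS = {"no", "not"}


def attach_negation(tokens, to_previous):
    """Merge each negation word with a neighbour, e.g. 'do+not' or 'not+like'."""
    merged = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() in NEGATIONS:
            if to_previous and merged:
                merged[-1] = merged[-1] + "+" + tok   # attach to the word before
                i += 1
                continue
            if not to_previous and i + 1 < len(tokens):
                merged.append(tok + "+" + tokens[i + 1])  # attach to the word after
                i += 2
                continue
        merged.append(tok)
        i += 1
    return merged


def ngrams(tokens, n=2):
    """Return all runs of n consecutive tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def negation_bigrams(tokens):
    """Collect bigrams from both attachment variants, without duplicates."""
    grams = []
    for to_previous in (True, False):
        for gram in ngrams(attach_negation(tokens, to_previous), 2):
            if gram not in grams:
                grams.append(gram)
    return grams
```

For the tokens of “I do not like jazz music”, the result contains “I do+not”, “do+not like” and “not+like jazz” alongside the ordinary bigrams.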

3. Parallel Processing: 

The sentiment classifier classifies the sentiments 
using a Naive Bayes classifier. Every database has 
hidden information that can be used for 
decision-making. Classification finds a model, and 
this model can be used to predict the class of 
objects. It is a two-step process: the first step is 
Model Construction, in which the model is built from 
the training set, and the second step is Model Usage, 
in which the model is used to classify new data. 
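
The two steps can be illustrated with a small multinomial Naive Bayes classifier, sketched here in Python rather than in the project's R code. The add-one (Laplace) smoothing and the use of log probabilities are assumptions made to keep the example numerically safe; the paper does not state how unseen words are handled.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Model construction: count classes and words from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, tokens):
    """Model usage: return the class c maximizing log P(c) + sum of log P(x_i | c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # class prior P(c)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing (assumption): unseen words get a small non-zero likelihood.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training on a handful of labeled token lists and then calling `classify` on a new token list returns the label with the highest posterior score.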

4. Sentiment Scoring Module: 

The basic feature of this model is the polarity of 
the words. A dictionary contains a list of English 
words, each with a score that ranges from 1 to 3. The 
scoring module uses these scores to determine the 
sentiment of the textual data. 


Polarity    Sentiment 
1           Negative 
2           Neutral 
3           Positive 

Table 1. Polarity Values 
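
A dictionary-based scorer using the polarity values of Table 1 might look like the Python sketch below (not the project's R code). The lexicon entries are hypothetical, and two choices are assumptions that the paper does not specify: words missing from the dictionary count as neutral (2), and a tweet's score is the mean of its word scores.

```python
# Hypothetical lexicon entries using the polarity values of Table 1.
LEXICON = {"good": 3, "great": 3, "win": 3, "bad": 1, "terrible": 1, "loss": 1}
LABELS = {1: "Negative", 2: "Neutral", 3: "Positive"}


def score_tweet(tokens, lexicon=LEXICON):
    """Return the mean polarity of the tokens; unknown words count as neutral (2)."""
    if not tokens:
        return 2.0
    return sum(lexicon.get(tok.lower(), 2) for tok in tokens) / len(tokens)


def sentiment(tokens, lexicon=LEXICON):
    """Map the mean score to a label: above 2 is Positive, below 2 is Negative."""
    score = score_tweet(tokens, lexicon)
    if score > 2:
        return LABELS[3]
    if score < 2:
        return LABELS[1]
    return LABELS[2]
```

Averaging keeps the score on the same 1 to 3 scale as the dictionary, so the thresholds line up with the table.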


5. Output the sentiment: 

Based on the score obtained from the previous 
module, the system interprets whether the tweet is 
positive, neutral or negative and presents the results 
using visualization models. 

Visualization: 

The details of the analysis and their visualizations 
are shown below. R packages such as tm, reshape, 
gridExtra, ggplot2 and wordcloud are used for 
visualization. The tweets can be obtained from 
different countries of the world, and the analysis is 
based on computation over these tweets. 

Word cloud: 

The words are extracted from the tweets based on the 
analysis of the hashtag (the words that start with a 
hash). The content attribute of the hashtag is used in 
the map function, and the reduce function is used to 
get the respective counts of different hashtags. The 
Shiny interface allows the user to select the maximum 
number of words as well as the minimum frequency of 
words used for visualization. 
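
The map and reduce steps, together with the two user-selected limits, can be sketched in Python as follows (not the project's R and Shiny code). The function names are hypothetical, and lowercasing the hashtags before counting is an assumption.

```python
import re
from collections import Counter
from functools import reduce


def hashtag_counts(tweets):
    """Map each tweet to its hashtag counts, then reduce them into one total."""
    mapped = (Counter(tag.lower() for tag in re.findall(r"#\w+", t)) for t in tweets)
    return reduce(lambda acc, c: acc + c, mapped, Counter())


def select_words(counts, max_words, min_freq):
    """Keep at most max_words entries whose frequency is at least min_freq."""
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    return frequent[:max_words]
```

The list returned by `select_words` is what a word cloud would draw, with the count of each hashtag determining its size.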


[Figure: word cloud titled “Most used words associated with the hashtag”] 

Fig 3: Wordcloud 


Top 20 tweeters: 

In this visualization type, the top 20 tweeters for 
the hashtag are displayed. A bar plot is created using 
the barplot function, which takes the bar heights as a 
vector. 
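
The height vector for such a bar plot can be derived by counting tweets per author, as in the Python sketch below (not the project's R code). The input is assumed to be a list with one screen name per tweet, and the function name is hypothetical.

```python
from collections import Counter


def top_tweeters(screen_names, n=20):
    """Return the n most frequent authors and the vector of their tweet counts."""
    ranked = Counter(screen_names).most_common(n)
    names = [name for name, _ in ranked]
    heights = [count for _, count in ranked]
    return names, heights
```

The `heights` list plays the role of the height vector, and `names` supplies the bar labels.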


Pie chart: 

A pie chart graphically shows the opinions of the 
people. 

[Figure: pie chart titled “Sentiment Analysis” with Positive and Negative segments] 

Fig 5. Pie Chart 

Tabular Form: 

The sentiments are presented here on a scale of 5. 
The table lists the positive score and negative score 
of each tweet. 


[Figure: table of sample tweets with their scores. Legible rows:] 

RT @ani_digital: Last batch of GNLA cadres surrender in #Meghalaya Read @ANI Story | /BomBewXN6E 
Experience of this haven of peace at Pala #Resort in #Cherrapunjee #Sohra #Meghalaya /sTgSGsaTcG 
#Meghalaya: Notorious criminal escapes police custody once again from #Tura Civil Hospital /5hlcipdFL2 

[The remaining rows are truncated or repeated; the legible score values all read 100.00.] 

Fig6. Tabular form 

Histogram: 

A histogram shows the score according to the 
positive, negative and neutral opinions. This model 
shows two histograms, one for positive and the other 
for negative sentiment, based on the frequency over 
days. 


[Figure: bar plot titled “Top 20 tweeters of hashtag”] 

Fig4. Top Tweets 

[Figure: two panels titled “Histogram of Positive Sentiment” and “Histogram of Negative Sentiment”] 

Fig7. Histogram 



CONCLUSION: 

The main objective of this paper was to describe and 
design a system for classifying the sentiments of 
people over a period of time using their tweets. It 
was developed using the R and RStudio big data 
processing technologies. We developed a set of 
analytical representations that help the user to 
explore the data and gain insights from it. The 
visualizations were implemented as Shiny web 
applications. 